Abstract. A physical law is represented by the probability distribution of a measured variable. The probability density is described by measured data using an estimator whose kernel is the instrument scattering function. The experimental information and data redundancy are defined in terms of information entropy. The model cost function, composed of data redundancy and estimation error, is minimized by the creation-annihilation process.

PACS. 06.20.Dk Measurement and error theory - 02.50.+s Probability theory, stochastic processes, and statistics - 89.70.+c Information science

1 Introduction

Quantitative physical explorations of natural phenomena involve three basic tasks: performing experiments, processing data, and modeling physical laws. […] The leading trend in the development of modern experimental systems is to automatize the first two tasks, while the solution of the critical problem of modeling is still left to intuition. In the recent literature there are already attempts to program the modeling as well for execution on a computer, especially for data acquisition systems in industrial environments. [2] Since measurements are always subject to random influences, [1] a statistical approach to modeling is natural: we consider the probability distribution as the basis for modeling a physical law. The first step in this modeling is an estimation of the probability density function (PDF) from experimental data. The most widely applicable approach is non-parametric estimation, as it requires no assumptions about the PDF.
From the estimated PDF the experimental physical law can be extracted using the conditional average. This average represents a non-parametric regression, which can be carried out by computer simultaneously with the data acquisition. The structure of the corresponding information processing system resembles that of a radial basis function neural network. [2,4,5] In addition to non-parametric regression, several other paradigms from the field of artificial neural networks, such as multilayer perceptrons, can be interpreted as automatic modelers of physical laws. [2,4,6] Various algorithms for adapting a selected model to experimental data have already been described, [2,4,6] but the development of fundamental principles for specifying the model structure is still a subject of current research. The problems stem from a significant contrast between the complexity of experimental data and the structure of physical laws. The information about the phenomenon explored generally increases with the number of experimental data; hence instrumental science and technology tend to develop electronic devices with ever greater storage capacity. Contrary to this, the most prominent property of a physical law is its simplicity. [1] At present it is still not clear how an electronic modeler could automatically and optimally compress the overwhelming experimental data into a simple law, although the theory of algorithmic information has already prepared some fundamentals for the treatment of this problem. [8,9,10]

A simple model of a physical law can be obtained by minimizing a cost function composed of model error and complexity. [8] The theory of statistics offers well elaborated methods for the estimation of the error, [3,4,6] while the description of model complexity is physically less well established. [11,12] For this purpose the measure of algorithmic complexity is applicable, [7,8,9] but this measure is derived from the program code that determines the average model performance. In the physical literature, complexity is usually considered an intrinsic property of the phenomenon and should therefore be expressed directly in terms of measured values. [11] With this aim we define in the next section the experimental information provided by measurements with an instrument of limited accuracy. It turns out that experimental information is useful for describing the excessive complexity of data, which can be utilized to introduce the model cost function.

In order to avoid problems with joining the error and complexity of the model in the cost function, it is convenient to express both terms by a single quantity. For this purpose we employ the entropy of information, [12,13] since it is non-dimensional and provides a common basis for formulating error and complexity.

2 Experimental information and redundancy

In defining the experimental information we consider a scalar-valued variable $X$, since the generalization to the multivariate case is straightforward. For this variable we select a bounded continuous sample space $S_x = (-L, L)$, where $2L$ is the span of the instrument applied. We assume that an arbitrary number, say $N$, of statistically independent measurements has yielded the samples $x_1, \ldots, x_N$. The non-parametric estimator of the PDF is then expressed by the sample average [2,3]

$$ f(x) = \frac{1}{N} \sum_{n=1}^{N} \delta(x - x_n). \tag{1} $$

This estimator, though unbiased, is not consistent. [2] As Parzen has shown, [3,4] it can be made consistent by using as a kernel a smooth approximation of the delta function, such as the Gaussian

$$ g(x - x_n, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - x_n)^2}{2\sigma^2}\right] \tag{2} $$

with some standard deviation $\sigma$ dependent on $N$. Parzen's estimator

$$ f_N(x) = \frac{1}{N} \sum_{n=1}^{N} g\big(x - x_n, \sigma(N)\big) \tag{3} $$

is therefore biased, [5] but the bias asymptotically vanishes if $\sigma(N)$ properly decreases towards 0 with increasing $N$. [3,4] The samples $x_1, \ldots, x_N$ themselves are convenient parameters of the PDF model, but unfortunately their number must increase without limit, and the smoothing parameter $\sigma(N)$ is introduced arbitrarily. [4] Since measurements are subject to instrumental scattering, the requirement that $\sigma(N)$ vanish is in conflict with a correct physical presentation of measured quantities. Consequently, we want to replace Parzen's method by a finite procedure that is more in tune with the properties of experiments and incorporates the measurement inaccuracy in the PDF estimator from the very beginning. For this purpose we turn first to the description of the instrumental scattering and the interpretation of measured data.

A strict mathematical analysis of the performance of various PDF estimators has attracted much attention, and in more advanced publications on this subject the information entropy is utilized as a common analytical tool. [15,16,17] However, an exhaustive mathematical analysis of estimator performance appears too cumbersome for experimentalists, who often want to estimate the performance of their estimator already during the execution of experiments. Consequently, we still utilize the kernel estimator, but, contrary to Parzen, we take into account in the description of the kernel the scattering of data caused by the measurement procedure, and we describe the estimator performance by the entropy of information.

An acquisition of a measured datum can generally be considered as a measurement process in which the measured object generates the instrument output $x$. Common to all measurements is that there exists an agreement by which the units for the observed variable are selected. Hence we assume that a set of objects which represents the units $\{U_k, k = 1, \ldots\}$ is available. Using these objects we can perform a calibration of our instrument. The next common property of measurements is that the outputs of instruments fluctuate even when calibration is performed. We assume that this property can be characterized by determining the density of the probability distribution of the instrument output at each selected unit. We denote the density of this distribution by $\psi(x|U_k)$. Its mean value $u_k = E[x|U_k]$ and standard deviation $\sigma$ are usually used to denote the $k$-th element of the scale and the scattering of the instrument output at calibration. For the sake of simplicity we further consider the cases where the output scattering does not depend on the position on the scale and can be expressed as a function of $x - u_k$ and $\sigma$ alone: $\psi(x|U_k) = \psi(x - u_k, \sigma)$. Most commonly a Gaussian scattering function $\psi(x - u_k, \sigma) = g(x - u_k, \sigma)$ is observed. We can generally repeat the calibration procedure with a selected unit $U_k$ a finite number of times and obtain a statistical set of calibration samples $\{\psi(x - u_k, \sigma)_n; n = 1, \ldots, N_c\}$. If the instrument is well calibrated, the scattering functions obtained in repeated calibrations do not differ essentially. The mean of the scattering function over the set of samples is then approximately equal to the result obtained by just one calibration:

$$ \bar{\psi}(x - u_k, \sigma) = \frac{1}{N_c} \sum_{n=1}^{N_c} \psi(x - u_k, \sigma)_n \approx \psi(x - u_k, \sigma)_1. \tag{4} $$

In this case we consider the output scattering as a result of inherent fluctuations of the measurement procedure, and the standard deviation $\sigma$ as the parameter that depends on the quality of the instrument.

Next consider a set of $N$ measurements of variable $X$ with the well-calibrated instrument, which yield the set of distributions $\{\psi(x - x_i, \sigma); i = 1, \ldots, N\}$ with a standard deviation that is practically independent of $i$. In this case we interpret the scattering of the mean values $x_1, \ldots, x_N$ as the consequence of the external variation of the input in repeated measurements. We therefore consider the instrument input $X$ as a random variable and describe its PDF by the mean over the set of experimentally obtained distributions $\{\psi(x - x_i, \sigma); i = 1, \ldots, N\}$. The corresponding mixture model [3]

$$ f_N(x) = \frac{1}{N} \sum_{i=1}^{N} \psi(x - x_i, \sigma) \tag{5} $$

resembles Parzen's estimator, Eq. (3), but here $\sigma$ is a constant given by the instrument calibration that is independent of $N$. Therefore we also omit $\sigma$ from $\psi$ in the following text. If the true probability distribution of variable $X$ is given, then the general properties of this estimator can be analysed following the methods developed by other authors. [11,16,17] However, we rather proceed to the definition of experimental information and the demonstration of its applicability for estimating the optimal number of experiments needed for the specification of the PDF. With this aim we first describe the indeterminacy of variable $X$ in terms of the entropy of information. [12] For a discrete random variable that assumes $N$ states with probabilities $p_i$, Shannon introduced the entropy of information by [13]

$$ H = -\sum_{i=1}^{N} p_i \log p_i. \tag{6} $$

It always lies between 0 and $\log N$ and attains its maximal value when all probabilities are equal: $p_i = 1/N$. For a continuous random variable with PDF $f(x)$ the entropy of information must be defined relative to some given reference probability density function $p(x)$ as [18]

$$ H = -\int f(x) \log \frac{f(x)}{p(x)}\, dx. \tag{7} $$

We will use as the reference the uniform density $p(x) = 1/2L$ over the instrument range, for which we get

$$ H = -\int_{-L}^{L} f(x) \log f(x)\, dx - \log 2L. \tag{8} $$
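The estimator of Eq. (5) and the entropies of Eqs. (6) and (8) are straightforward to evaluate numerically. The following is a minimal sketch, assuming a Gaussian scattering function and trapezoidal integration on a uniform grid over $(-L, L)$; the function names and the grid size are illustrative choices, not part of the method described here.

```python
import math


def gauss(x, center, sigma):
    """Gaussian scattering function g(x - center, sigma) of Eq. (2)."""
    z = (x - center) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def mixture_estimate(x, samples, sigma):
    """Mixture model f_N(x) of Eq. (5) with a fixed calibration width sigma."""
    return sum(gauss(x, xi, sigma) for xi in samples) / len(samples)


def discrete_entropy(probs):
    """Shannon entropy H = -sum_i p_i log p_i of Eq. (6), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_uniform_ref(f, L, n_grid=4001):
    """Entropy of Eq. (8), -int_{-L}^{L} f log f dx - log 2L, by the trapezoidal rule.

    Assumes f is a density on (-L, L). Points where f underflows to zero
    contribute nothing, following the convention 0 log 0 = 0.
    """
    h = 2.0 * L / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        fx = f(-L + k * h)
        term = fx * math.log(fx) if fx > 0.0 else 0.0
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoidal end weights
        total += weight * term
    return -total * h - math.log(2.0 * L)
```

For the uniform density $f(x) = 1/2L$ the function `entropy_uniform_ref` returns 0, the maximum of Eq. (8); for a single Gaussian with $\sigma \ll L$ it returns the Gaussian differential entropy $\frac{1}{2}\log(2\pi e \sigma^2)$ minus $\log 2L$. The grid must be fine compared with $\sigma$ for the trapezoidal rule to be accurate.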
With this formula we first express the uncertainty of the instrument calibration as

$$ H_u = -\int_{-L}^{L} \psi(x, u) \log \psi(x, u)\, dx - \log 2L. \tag{9} $$

For $\sigma \ll L$ we obtain from the Gaussian scattering function $\psi(x, x_i) = g(x - x_i, \sigma)$, whose differential entropy is $\frac{1}{2}\log(2\pi e \sigma^2)$, the approximation

$$ H_u \approx \log\frac{\sigma}{L} + \frac{1}{2}\left[1 + \log\frac{\pi}{2}\right], \tag{10} $$

which shows that the uncertainty of calibration depends only upon the ratio of the scattering width $2\sigma$ and the instrument span $2L$. The number $\log(\sigma/L)$ determines the lowest possible uncertainty of measurement on the given instrument, as achieved at its calibration.

The indeterminacy of the random variable $X$, which characterizes the scattering of experimental data, is defined by

$$ H_e = -\int_{-L}^{L} f_N(x) \log f_N(x)\, dx - \log 2L \tag{11} $$

and is generally greater than the uncertainty of calibration. We define the experimental information about $X$ by the difference

$$ I = H_e - H_u = -\int_{-L}^{L} f_N(x) \log f_N(x)\, dx + \int_{-L}^{L} \psi(x, u) \log \psi(x, u)\, dx. \tag{12} $$

For a measurement that yields a single sample $x_1$ the probability density is given by $f_1(x) = \psi(x, x_1)$, both integrals in Eq. (12) are equal, and the experimental information $I$ is zero. For a measurement which yields multiple samples $x_1, \ldots, x_N$ that are mutually separated by several $\sigma$, the distributions $\psi(x, x_i) = g(x - x_i, \sigma)$ are non-overlapping, and the first integral on the right of Eq. (12) can be approximated as

$$ -\int_{-L}^{L} \frac{1}{N}\sum_{i=1}^{N} \psi(x, x_i) \log\left[\frac{1}{N}\sum_{j=1}^{N} \psi(x, x_j)\right] dx \approx \log N - \int_{-L}^{L} \psi(x, x_i) \log \psi(x, x_i)\, dx, \tag{13} $$

so that $I \approx \log N$. When the distributions are overlapping, but not concentrated at a single point, the inequality $0 < I < \log N$ holds. As the same relation is characteristic of the entropy of information for a discrete random variable, the experimental information has a similar meaning: it describes how much information is provided by $N$ experiments performed by an instrument with the scattering density $\psi(x, x_i)$. We thus interpret $I$ as a measure of the complexity of experimental data.

According to the above analysis, $N$ repeated experiments can at most provide $I_{\max} = \log N$ of information, and this happens when the distributions $\psi(x, x_i)$ are non-overlapping. Since some overlapping normally takes place, the actual experimental information $I$ is smaller than $I_{\max}$. In such a case the measurements do not give the maximal possible information, which means that characterizing the probability distribution by $N$ experiments is to some extent redundant. Accordingly, we define the redundancy of experimental observation by the difference

$$ R = I_{\max} - I = \log N - I. \tag{14} $$

This definition is based only on available experimental data; therefore $R$ can be determined experimentally at each step of data acquisition. It should be pointed out that our definition differs from the common definition of redundancy in terms of mutual information, which requires the specification of the joint probability distribution of the variables that describe the data samples. [12,15,16,17]

If the standard deviation $\sigma$ of scattering is decreased by improving the experiment, the redundancy is reduced and tends to 0 along with $\sigma$. With an increasing number of samples, the overlapping of the distributions $\psi(x, x_i)$ on average increases; due to this overlapping, $I$ increases more slowly than $I_{\max} = \log N$ and tends to a certain value $I_\infty$ with increasing $N$. Consequently, the redundancy increases on average with the number of samples. Accordingly, the experimental information $I$ can be interpreted as a characteristic which determines the number $K$ of non-overlapping distributions that could represent the experimental observation. This number is defined by

$$ K = e^{I} \tag{15} $$

and can be determined from the experimental data and the scattering function $\psi$. Asymptotically $K$ tends to a value $K_\infty$, a characteristic which can be estimated quite accurately from a finite number of experiments.

We illustrate the above-mentioned properties by using a normal random variable $X$ with standard deviation $s = 2.5$. To allow a simple setting of its properties in the illustrated examples, the samples $x_i$ were generated by a computer. Fig. 1 shows the dependence of the experimental information on the number of samples for two cases of Gaussian instrument scattering, with $\sigma = 0.05$ and $\sigma = 0.25$. The results obtained with three different sample sets demonstrate the statistical variation of the empirical information. In both cases the experimental information is observed to converge to a fixed value, and the limits $K_\infty \approx 50$ and $K_\infty \approx 10$ are approximately estimated. As could be expected, in both cases they are equal to the ratio $s/\sigma$. Similar results were also observed for the uniform PDF and for mixtures of normal PDFs. The displacement between the maximal possible experimental information $I_{\max} = \log N$ and the other curves in Fig. 1 is the redundancy of observation.

3 Cost function and an optimal number of samples

With an increasing number of experimental samples the empirically estimated PDF converges to a function

$$ f_\infty(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} g(x - x_i, \sigma), \tag{16} $$

which we consider as the hypothetical PDF of variable $X$. Since it cannot be determined exactly by a finite number of repeated experiments, we must decide when to stop the experimentation. From the analysis of the properties of Parzen's estimator [1, Eq. 4.19] we obtain the estimate for the variance $\mathrm{Var}[f_N(x)] \le [\sup g(x)]^2/N$, which is applicable if the accuracy of estimation is prescribed. When the accuracy is not prescribed, the inequality only indicates that $N$ should be increased in order to decrease the variance; but with increasing $N$ the redundancy increases, and we should consider both properties when deciding about a proper number of samples $N$. With this aim we utilize two estimators comprising $N$ and $T$ samples. The estimator with $T$ samples is introduced as a reference by which we estimate the prediction error of the estimator with $N$ samples. Consequently, $f_T$ should estimate $f_\infty$ with much greater accuracy than $f_N$, and therefore we take $T \gg N$. We then describe the estimation error by the Kullback-Leibler information divergence in its symmetric form [2]

$$ D = \int_{-L}^{L} \left[f_N(x) - f_T(x)\right] \log\frac{f_N(x)}{f_T(x)}\, dx \tag{17} $$

and define the information cost of $f_N$ relative to $f_T$ by

$$ C = D + R_N - R_T. \tag{18} $$

The dependence of $C$ on $N$ with $T = 10^4$ is shown in Fig. 2 for the same data as in Fig. 1. The number $N_o$ at which the cost $C$ is minimal is to be considered the proper number of samples for modeling the PDF. It depends on the samples used in the estimator $f_N$, and we statistically determined $N_o = 35 \pm 20$ for $\sigma = 0.25$ and $N_o = 218 \pm 64$ for $\sigma = 0.05$. The relatively large statistical scattering of $N_o$ is a consequence of the very slow, approximately logarithmic divergence of the redundancy. The number $N_o$ also depends on the sample set used in the estimator $f_T$, but if $T$ is much greater than $N_o$, its influence is negligible in comparison with the statistical scattering. Fig. 3 shows an example of the estimated probability density $f_{N_o}$ for $\sigma = 0.25$, $N_o = 46$ and $C(N_o) = -4.8$ nat. For the purpose of comparison, $f_T$ is also shown in Fig. 3. Our examples show that the proper number $N_o$ is several times greater than $K_\infty$. Since $K_\infty$ can be simply calculated, $N_o$ can also be roughly estimated without calculating the cost function.

Figure 3 shows that $f_{N_o}$ is a rather coarse estimator of the probability density. The reason for this property becomes clear if we consider how the estimation error and the redundancy term in the cost function vary with increasing $N$. Because the redundancy keeps increasing with $N$, the minimum of $C$ is reached at a low number of samples; hence the estimation error need not be negligible, but just properly counterbalanced by the redundancy. This further means that a low number of functions $g(x - q_i, \sigma)$ with a small $\sigma$ cannot very accurately represent a broad and smooth function $f_T(x)$.

4 Generalized PDF model

If we want to improve the representation of the PDF by a small number of functions, we evidently may not keep $\sigma$ fixed. For this purpose we change the estimator $f_{N_o}$ into a general mixture model

$$ f_M(x) = \sum_{i=1}^{M} p_i \psi_i(x) \tag{19} $$

by using $M$ basis functions $\psi_i(x) = g(x - q_i, \sigma_i)$ and adjustable parameters $q_i$, $\sigma_i$ and $p_i$. We define here the entropy of the basis functions and the information content of the model as means over the probabilities $p_i$:

$$ H_s = -\sum_{i=1}^{M} p_i \int_{-L}^{L} \psi_i(x) \log \psi_i(x)\, dx - \log 2L, \tag{20} $$

$$ I_M = H_M - H_s = -\sum_{i=1}^{M} p_i \int_{-L}^{L} \psi_i(x) \log\frac{f_M(x)}{\psi_i(x)}\, dx, \tag{21} $$

where $H_M$ is the entropy of $f_M$ defined as in Eq. (8). The model redundancy and estimation error are then

$$ R_M = \log M - I_M, \tag{22} $$

$$ D_M = \int_{-L}^{L} \left[f_M(x) - f_T(x)\right] \log\frac{f_M(x)}{f_T(x)}\, dx. \tag{23} $$
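The experimental information, redundancy, divergence and cost of Eqs. (12), (14), (17), (18) and (21)-(23) can be evaluated with a few lines of code. The sketch below is a minimal illustration, assuming Gaussian basis functions and trapezoidal integration on a uniform grid over $(-L, L)$; the class `Mixture` and all helper names are hypothetical. Because Eq. (5) is the special case $q_i = x_i$, $\sigma_i = \sigma$, $p_i = 1/N$ of Eq. (19), one routine for $I_M$ also yields the experimental information $I$ of Eq. (12).

```python
import math


def gauss(x, center, sigma):
    """Gaussian basis function g(x - center, sigma)."""
    z = (x - center) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def integrate(func, L, n_grid=4001):
    """Trapezoidal rule for the integral of func over (-L, L)."""
    h = 2.0 * L / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * func(-L + k * h)
    return total * h


def x_log_x(v):
    """v log v with the convention 0 log 0 = 0 (guards against underflow)."""
    return v * math.log(v) if v > 0.0 else 0.0


class Mixture:
    """Mixture f_M(x) = sum_i p_i g(x - q_i, sigma_i), Eq. (19)."""

    def __init__(self, centers, widths, weights):
        self.q, self.s, self.p = list(centers), list(widths), list(weights)

    @classmethod
    def kernel_estimate(cls, samples, sigma):
        """Experimental model f_N of Eq. (5): q_i = x_i, sigma_i = sigma, p_i = 1/N."""
        n = len(samples)
        return cls(samples, [sigma] * n, [1.0 / n] * n)

    def __call__(self, x):
        return sum(p * gauss(x, q, s) for q, s, p in zip(self.q, self.s, self.p))

    def information(self, L):
        """I_M = H_M - H_s of Eq. (21); equals Eq. (12) for the kernel estimate."""
        h_mix = -integrate(lambda x: x_log_x(self(x)), L)
        h_basis = -sum(p * integrate(lambda x, q=q, s=s: x_log_x(gauss(x, q, s)), L)
                       for q, s, p in zip(self.q, self.s, self.p))
        return h_mix - h_basis  # the -log 2L terms of H_M and H_s cancel

    def redundancy(self, L):
        """R = log M - I, Eqs. (14) and (22)."""
        return math.log(len(self.q)) - self.information(L)


def divergence(f_a, f_b, L):
    """Symmetric divergence D = int (f_a - f_b) log(f_a / f_b) dx, Eqs. (17), (23)."""
    def integrand(x):
        a, b = f_a(x), f_b(x)
        # Where either density underflows to zero the contribution is negligible.
        return (a - b) * math.log(a / b) if a > 0.0 and b > 0.0 else 0.0
    return integrate(integrand, L)


def cost(model, reference, L):
    """Information cost C = D + R_model - R_reference, Eq. (18)."""
    return (divergence(model, reference, L)
            + model.redundancy(L) - reference.redundancy(L))
```

For example, a single sample gives $I = 0$, while four samples separated by many $\sigma$ give $I \approx \log 4$ and vanishing redundancy, in line with Eq. (13). Under these assumptions, the search for $N_o$ in Section 3 amounts to evaluating `cost(Mixture.kernel_estimate(samples[:N], sigma), reference, L)` for increasing $N$, with `reference` built by `Mixture.kernel_estimate` from a much larger sample set of size $T$, and keeping the $N$ of minimal cost. The brute-force grid integration scales with the number of terms times the grid size, so for $T$ of order $10^4$ a vectorized implementation would be preferable.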
With these characteristics we define the information cost of the model relative to the experimentally estimated $f_T(x)$ as

$$ C_M = D_M + R_M - R_T. \tag{24} $$

If we want to adapt the model of Eq. (19) to experimental data, we must specify the number $M$ and the parameters $q_i$, $\sigma_i$ of the basis functions. [2] We cannot achieve this by the variational method, since $M$ is an integer. [3, …] Various methods of growing and pruning have been developed for this purpose in the field of neural networks. […] The growing methods are mainly utilized when the model is adapted to an increasing number of experimental samples, while pruning is used when a large number of experimental samples is compressed to a smaller number of representative data. In any case a decision about the creation or annihilation of model terms must be reached, based upon some criterion. In the literature various criteria have already been proposed, ranging from purely heuristic to strictly theoretical ones, but at present there is still no generally accepted method. [7, …] In our treatment we decide to change the number of basis functions in the model if the cost function $C_M$ is decreased by such an action. With this criterion we tested first the annihilation process and then a combined creation-annihilation process, which are described in the following subsections.

4.1 Model optimization by annihilation of terms

Consider the case when the function $f_T$ is determined by an extensive set of redundant experimental data. We start the adaptation of the model of Eq. (19) to these data by selecting $M = T$ and assigning the values $q_i = x_i$, $\sigma_i = \sigma$, $p_i = 1/T$ to the parameters of the basis functions. After that we consider a model with $M = T - 1$ terms. If we try to determine the parameters of the compressed model by a strict mathematical procedure based on minimization of the cost function $C_{T-1}$, we obtain a set of non-linear equations that is difficult to treat further. Less rigorously, but physically more sensibly, we proceed by assuming that an improved model can be obtained by compressing the $i$-th and $k$-th terms, determined by $p_i, q_i, \sigma_i$ and $p_k, q_k, \sigma_k$, into a single term with parameters

$$ p_j = p_i + p_k, \qquad q_j = \frac{p_i q_i + p_k q_k}{p_j}, \qquad \sigma_j = \left[\sigma_i^2 \frac{p_i}{p_j} + \sigma_k^2 \frac{p_k}{p_j} + (q_i - q_k)^2 \frac{p_i p_k}{p_j^2}\right]^{1/2}, $$

which represent the common probability, center of gravity and standard deviation, respectively; consequently, the total probability and the first two moments of the probability distribution are preserved. The terms are actually compressed only if the cost function is decreased. In the case of just two terms with equal probabilities and widths it was found numerically that they are compressed only if their centers are separated by less than approximately $3\sigma$. The procedure is then iterated over all terms of the model until all possible compressions are carried out. Fig. 4 shows a result of this procedure for a bi-modal PDF. From the function $f_T$, determined by $10^{\ldots}$ experimental data, we obtain, after compression, a model with just two basis functions and significantly reduced redundancy. The agreement between the experimentally estimated $f_T$ and the model function $f_M$ is determined by the prediction error $D_M = 0.01$ nat, while $C_M = 0.15$ nat describes the information cost of such a representation. In this case the cost is mainly determined by the redundancy, 0.14 nat, which is a consequence of the overlapping of the model basis functions.

4.2 Model adaptation by the creation-annihilation process

Although model optimization by annihilation is simple, its weak point is that all the experimental data must be acquired before the start of adaptation. But it is often convenient to form the model simultaneously with the acquisition of experimental data. In this case the compression method could still be performed after each acquisition step, but for this purpose all previously acquired data must be stored. We therefore propose a more economical method whereby fewer model parameters are stored. At $T = 1$ we start modeling by setting $f_1(x) = g(x - x_1, \sigma)$. After each acquisition step we then create a new term with the parameters $x_T$, $\sigma_T = \sigma$, $p_T = 1/T$ and include it in the previous model function by using the weighted average

$$ f_T(x) = \frac{1}{T}\, g(x - x_T, \sigma) + \frac{T - 1}{T}\, f_{T-1}(x). $$

The compression is then performed on this function. The created term is either annihilated, if the acquired sample $x_T$ falls close to the center of one of the basis functions that comprise the model, or is preserved as an additional term of the model. With increasing $T$ the modification of the PDF by new experimental samples is less and less pronounced. When we perform this procedure with the samples that were used in the preparation of Fig. 4, the resulting model PDF agrees with the function obtained by the annihilation process. In the annihilation process the model function is compared with $f_T$ as determined from a large number of samples, while in the creation-annihilation process two successive model functions are compared. Since comparison of model functions can lead to accumulation of errors, one could generally expect a smaller modeling error when using the annihilation process.

On average, the number of model terms in the creation-annihilation process initially increases and subsequently decreases with the number of acquired experimental samples. Therefore, it is instructive to follow the development of the model with an increasing number of samples. Fig. 5 shows the result obtained during the adaptation of the model to the bi-modal PDF of Fig. 4. At each acquisition time the position of the sample $x_T$ is marked by a star, while the centers $q_i$ of the basis functions are marked by bullets, which may merge into lines. In the initial phase of the model adaptation several basis functions are created, and in the later phase some of them are annihilated until ultimately an optimal model structure is established. After that, the parameters of the model are less and less influenced by new experimental data. Annihilation of model terms generally keeps the number $M$ of model functions below the number $T$ of samples. Consequently, for large $T$ the storage of model parameters usually requires significantly less memory space than the storage of all the experimental data, and the resulting parameters of the model can often be related to basic processes underlying the investigated phenomenon.

The general mixture model quite often exhibits significantly lower redundancy than the experimental model of Eq. (5). For example, after compression of the experimental data which determine $f_T$ of Fig. 3, we obtained just one term, with $q_1$ and $\sigma_1$ determined by the sample mean and standard deviation of variable $X$. These represent a non-redundant optimal model of the hypothetical PDF. A similar conclusion holds for the model of the bi-modal PDF of Fig. 4.

5 Conclusions

We have shown how the PDF of a scalar variable can be estimated non-parametrically by taking into account the inaccuracy of measurements. From the properties of the PDF estimator we have defined the experimental information and the redundancy of data. Even though the same definition can be applied to a multivariate variable, the analysis is less transparent, since the number of parameters in the scattering function increases. We have not specified the form of the scattering function based on fundamental principles, but the central limit theorem of probability in[…] are evenhandedly treated in the cost function. If the width of the basis functions is determined by experimental scattering only, then the model yields a rather coarse estimate of the PDF. The quality of the estimate can often be significantly improved by using the generalized mixture model. The adaptation of the mixture model leads to an effective PDF estimator that is applicable in automatic measurement systems. The creation-annihilation process described here also represents a new approach to the modeling of artificial neural networks. [2] In this case the modeler represents a dynamic system with adaptable parameters that are influenced by the experimental data. The evolution of the model terms by creation and annihilation resembles condensation processes in vapors or the evolution of grains in alloys, and is a typically non-linear, self-organized phenomenon. This analogy indicates the possibility of describing the optimal modeler by statistical physics and synergetics.

Acknowledgment
Fig. 4. Probability density function $f_M$ (solid line) adapted to $f_T$ (dashed line) by the compression of basis functions in the model.